Analysis of Parallel Structures in Patent Sentences, Focusing on the Head Words

Author

  • Shoichi Yokoyama
Abstract

One characteristic of patent sentences is their long, complicated modification structures. A modification is identified by the presence of a head word in the modifier. We extracted head words with a high occurrence frequency from about 1 million patent sentences. Based on the results, we built a modifier-correcting system that uses these head words. The system was able to correct about 60% of the errors.

Similar resources

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical written chain that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...

Identification of BKCa channel openers by molecular field alignment and patent data-driven analysis

In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...

Acquisition of cleft structures in L1 and L2

The present study aims at exploring the processing difficulty of cleft structures, as a type of relative clause, for EFL learners and learners of Persian as a first language. The impact of head nouns with various functions, as well as that of embedding, on the processing of Persian and English cleft structures was investigated. The participants were 68 Iranian male and female students...

Persian Text Recognition Using an n-gram Language Model and Grammatical Refinement

Text recognition has been one of the growing research topics in recent years. Much of this research has focused on the recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...

A Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures

This paper presents a syntactic analysis method that first detects conjunctive structures in a sentence by checking parallelism of two series of words and then analyzes the dependency structure of the sentence with the help of the information about the conjunctive structures. Analysis of long sentences is one of the most difficult problems in natural language processing. The main reason for thi...

Publication date: 2013